PART 1


This part of the paper carries 50% of the total marks. 

You should attempt ALL the questions in this part of the 
examination. 

You should note that for some of these questions you may be 
required to select more than one answer from the options given. 
All such questions include an instruction like ‘You should choose 
TWO options for this question’. 

Record your answers in pencil on the CME form provided
according to the instructions below. (Allow some time to check
that your selections have been correctly entered on the CME
form.)


Instructions for completing the computer-marked 
examination (CME) form 


1. You will find one CME form provided with this paper.

2. You should use a pencil to make entries on the CME form.
   If you make any smudges or other marks on the form that you
   cannot cancel out clearly, then you should ask the invigilator for
   a new form, and transfer your entries onto the new form.

3. The CME form consists of Part 1 (see point 4 below) and Part 2
   (see point 5 below).

4. In Part 1 of the CME form, write your name and personal
   identifier (not your examination number), and the assignment
   number for the examination, that is, M248 81.

5. In Part 2 of the CME form, record your answers to Questions 1
   to 30.

6. Please note that for each question you should pencil across either
   the required number of cells or the ‘don’t know’ cell (denoted by
   ?).

7. If you think that a question is unsound in any way, pencil across
   the ‘unsound’ cell (U) in addition to pencilling across either an
   answer cell or the ‘don’t know’ cell.

8. Please note that you will not be allowed extra time at the end of
   the examination to fill in your CME form.


Failure to follow the above instructions may mean that we 
are unable to identify your work and award a mark for 
Part 1 of the examination. 
Question 1
A sample of data consists of the following values. 
1.01 1.07 1.11 1.11 1.13 1.14 1.26 1.74 
Select the TWO options that give the values of (i) the lower quartile, 
and (ii) the upper quartile. [2] 
Options for Question 1 
You should choose TWO options for this question 
A 1.12 B 1.055 C 1.08 D 1.14 
E 1.07 F 1.23 G 1.1525 H 1.11 
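
The quartile arithmetic for a small sample like this can be sketched as follows. This is a minimal illustration that assumes the quartiles sit at positions (n + 1)/4 and 3(n + 1)/4 in the ordered sample, with linear interpolation between neighbouring values; other conventions exist and give different results. The function name `quartile` is illustrative only.

```python
import math

def quartile(sorted_values, fraction):
    """Quantile at 1-based position fraction * (n + 1), by linear interpolation.

    Assumes the (n + 1) positioning convention; other conventions differ.
    """
    n = len(sorted_values)
    position = fraction * (n + 1)
    lower_index = math.floor(position)
    weight = position - lower_index
    # Clamp to the ends of the sample if the position falls outside 1..n.
    if lower_index < 1:
        return sorted_values[0]
    if lower_index >= n:
        return sorted_values[-1]
    below = sorted_values[lower_index - 1]
    above = sorted_values[lower_index]
    return below + weight * (above - below)

data = [1.01, 1.07, 1.11, 1.11, 1.13, 1.14, 1.26, 1.74]
lower_q = quartile(data, 0.25)
upper_q = quartile(data, 0.75)
print(round(lower_q, 4), round(upper_q, 4))
```

With eight observations the positions are 2.25 and 6.75, so each quartile lies between two adjacent ordered values.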


Question 2 
A sample of data consists of the following values. 


21 22 22 24 26 26 26 28 31 32 32 
35 36 38 39 39 40 41 44 56 60 


Given that the upper quartile and interquartile range for these data 
are 39.5 and 13.5, respectively, select the option that gives the upper 
adjacent value. [2] 


Options for Question 2 
A 44 B 56 C 59 
D 59.75 E 60 F 66.5 
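
As a hedged sketch of the adjacent-value calculation, the snippet below assumes the usual boxplot rule: the upper adjacent value is the largest observation lying no more than 1.5 interquartile ranges above the upper quartile. The variable names are illustrative.

```python
data = [21, 22, 22, 24, 26, 26, 26, 28, 31, 32, 32,
        35, 36, 38, 39, 39, 40, 41, 44, 56, 60]
upper_quartile = 39.5   # given in the question
iqr = 13.5              # given in the question

# Upper fence: 1.5 interquartile ranges above the upper quartile.
upper_fence = upper_quartile + 1.5 * iqr
# Upper adjacent value: the largest observation not beyond the fence.
upper_adjacent = max(x for x in data if x <= upper_fence)
print(upper_fence, upper_adjacent)
```

Note that the fence itself need not be a data value; the adjacent value always is.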
Question 3


Suppose that a sample is drawn on a random variable X that has the 
probability density function f(x) shown in Figure 1. 


[Figure 1: graph of the probability density function f(x)]
Select the THREE correct statements from the following options. 


Options for Question 3 
You should choose THREE options for this question 


A The sample mean is likely to be greater than the sample median 
since the mean is generally greater than the median for 
right-skew data. 


B The sample mean is likely to be less than the sample median since 
the mean is generally less than the median for right-skew data. 


C The sample mean is likely to be greater than the sample median 
since the mean is generally greater than the median for left-skew 
data. 


D The sample mean is likely to be less than the sample median since 
the mean is generally less than the median for left-skew data. 


E The mean is a more suitable measure of location for the sample 
than the median. 


F The median is a more suitable measure of location for the sample 
than the mean. 


G The standard deviation is a more suitable measure of dispersion 
for the sample than the interquartile range. 


H The interquartile range is a more suitable measure of dispersion 
for the sample than the standard deviation. 
Questions 4, 5, 6 and 7


A factory operates three shifts: Day, Swing and Night. Some of the
accidents that occur at the factory can be attributed at least in part
to unsafe working conditions; others cannot. The table below shows
the numbers of accidents of each type that occurred in each shift
during a year.


Numbers of accidents during a year

         Attributable to unsafe conditions
Shift      Yes     No     Total
Day         20     70        90
Swing       16     40        56
Night       10     44        54
Total       46    154       200


4 Choose the option that gives, to two decimal places, the 
proportion of accidents in the factory that were attributable to 
unsafe working conditions. 


Options for Question 4 
A 0.10 B 0.77 C 0.08 
D 0.05 E 0.23 F 0.30 


5 Choose the option that gives, to two decimal places, the 
proportion of accidents that occurred on the night shift. 


Options for Question 5 

A 0.27 B 0.45 C 0.19 

D 0.28 E 0.37 F 0.05 

6 Choose the option that gives an estimate, obtained using the
data, of the probability that an accident that occurred on the
night shift was due to unsafe working conditions.


Options for Question 6 
A 0.217 B 0.851 C 0.185 
D 0.227 E 0.230 F 0.050 


7 Choose the option that gives an estimate, obtained using the
data, of the probability that an accident that was attributable to
unsafe working conditions occurred on the day shift.


Options for Question 7 
A 0.100 B 0.450 C 0.286 
D 0.511 E 0.222 F 0.435 
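
The four quantities asked for in Questions 4 to 7 are all ratios of table counts, and the difference between a marginal proportion and a conditional one is only the choice of denominator. A minimal sketch, assuming the Swing shift "No" count is 40 (the value implied by the row and column totals):

```python
# Accident counts by shift: (attributable to unsafe conditions, not attributable).
# The Swing "No" count of 40 is implied by the row and column totals.
counts = {"Day": (20, 70), "Swing": (16, 40), "Night": (10, 44)}

total = sum(yes + no for yes, no in counts.values())
total_unsafe = sum(yes for yes, _ in counts.values())

# Marginal proportions use the grand total as denominator.
p_unsafe = total_unsafe / total
p_night = sum(counts["Night"]) / total

# Conditional estimates use the total of the conditioning row or column.
p_unsafe_given_night = counts["Night"][0] / sum(counts["Night"])
p_day_given_unsafe = counts["Day"][0] / total_unsafe

print(round(p_unsafe, 2), round(p_night, 2),
      round(p_unsafe_given_night, 3), round(p_day_given_unsafe, 3))
```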
Questions 8, 9, 10 and 11


Four variables are described below.


X  The number of wheel bearings, out of four, that are found to be
   damaged when a car has completed a long test run on a test track

Y  The length of the right antenna of an aphid measured when
   aphids are studied by an ecologist

Z  The number of traffic accidents that occur on a randomly chosen
   day on a particular stretch of motorway

U  The waiting time between arrivals of telephone calls at a
   customer service department


8 Choose the option that gives the distribution that would be a
reasonable initial probability model for variable X.


Options for Question 8
A Bernoulli B Binomial C Geometric
D Poisson E Exponential F Normal


9 Choose the option that gives the distribution that would be a
reasonable initial probability model for variable Y.


Options for Question 9
A Bernoulli B Binomial C Geometric
D Poisson E Exponential F Normal


10 Choose the option that gives the distribution that would be a
reasonable initial probability model for variable Z.


Options for Question 10 


A Bernoulli B Binomial C Geometric 


D Poisson E Exponential F Normal 


11 Choose the option that gives the distribution that would be a
reasonable initial probability model for variable U.


Options for Question 11 


A Bernoulli B Binomial C Geometric 


D Poisson E Exponential F Normal 
Question 12


A discrete random variable X has the probability distribution given in
the table below. 


N 
BIR c 


ele = 
oola 


Choose the option that gives the mean of X. 
Options for Question 12 


A 1i B i C 1 D 1 


AIW 
Question 13


A discrete random variable X has the probability distribution given 
in the table below. 


8 
mÓ 
al © 
nie = 


Given that the mean of X is L choose the option that gives the 
variance of X. 


Options for Question 13 

1 29 2 5 2 5 
Ag B x C = D È E 3 F > 
Question 14 


The reliability of an electrical fuse is the probability that a randomly
selected fuse will function under the conditions for which it has been
designed. The reliability of a particular type of fuse is known to
be 0.98. Assume independence from fuse to fuse.


Select the TWO options that give (i) the mean and (ii) the variance 
of the normal distribution that can be used to calculate the 
approximate value for the probability of observing 27 or more 
defective fuses in a random sample of 1000 fuses. 


Options for Question 14 
You should choose TWO options for this question 
A 0.28 B 0.53 C 0.54 D 19.6 


E 20 F 26.46 G 384.16 H 980 
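
A normal approximation to a binomial count matches the binomial mean np and variance np(1 - p). The sketch below applies that to the number of defective fuses, taking the probability of a defective fuse to be one minus the reliability; the variable names are illustrative.

```python
n_fuses = 1000
reliability = 0.98
p_defective = 1 - reliability   # probability that a single fuse is defective

# Moments of the binomial count of defective fuses, which the
# approximating normal distribution is assumed to share.
mean_defective = n_fuses * p_defective
variance_defective = n_fuses * p_defective * (1 - p_defective)

print(round(mean_defective, 2), round(variance_defective, 2))
```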
Questions 15, 16, 17 and 18


The arrival of incoming telephone calls at an insurance office during 
office hours may be modelled by a Poisson process. On average, four 
calls arrive per hour. 


15 Choose the option that gives the probability that exactly two 
calls arrive in an hour. 


Options for Question 15 
A 0.147 B 0.023 C 0.037 
D 0.090 E 0.271 F 0.012 


16 Choose the option that gives the probability that at least two 
calls arrive in an hour. 


Options for Question 16 
A 0.594 B 0.762 C 0.323 
D 0.927 E 0.908 F 0.238 


17 Choose the option that gives the probability that the interval 
between two successive incoming calls is less than ten minutes. 


Options for Question 17 
A 0.105 B 0.513 C 0.487 
D 0.811 E 0.999 F 0.895 


18 Choose the option that gives the distribution of the number of 
calls arriving between 9.00 am and 11.00 am. 


Options for Question 18 
A M(4) B Poisson(4) C M(2)
D Poisson(2) E M(8) F Poisson(8)
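
The calculations behind Questions 15 to 18 all follow from the standard properties of a Poisson process: counts in an interval are Poisson with mean rate times duration, and gaps between events are exponential with the same rate. A minimal sketch using only the standard library (the helper `poisson_pmf` is illustrative, not part of the paper):

```python
import math

rate_per_hour = 4.0  # average number of calls per hour

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

p_exactly_two = poisson_pmf(2, rate_per_hour)
p_at_least_two = 1.0 - poisson_pmf(0, rate_per_hour) - poisson_pmf(1, rate_per_hour)

# The waiting time between calls is exponential with the same rate;
# ten minutes is 1/6 of an hour.
p_gap_under_ten_min = 1.0 - math.exp(-rate_per_hour * (10 / 60))

# Counts over a two-hour window are Poisson with mean rate * duration.
mean_calls_9_to_11 = rate_per_hour * 2

print(round(p_exactly_two, 3), round(p_at_least_two, 3),
      round(p_gap_under_ten_min, 3), mean_calls_9_to_11)
```

Keeping the rate and the time interval in the same units (hours here) is the main thing to check in each step.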


Question 19 


The random variable Z has a geometric distribution G(0.7). Choose 
the option that gives the probability P(Z < 5). 


Options for Question 19 
A 0.0720 B 0.1029 C 0.2401 
D 0.7599 E 0.8319 F 0.9919 
Question 20


Bathroom tiles of a certain type are sold in packs of 100. The
approximate distribution of the weight (in grams) of a pack of
100 tiles is a normal distribution with mean 9900 grams and standard
deviation 50 grams.


Choose the option that gives an approximate value for the proportion 
of 100-tile packs that weigh more than 10 kilograms. [2] 


Options for Question 20 
A 0.0228 B 0.4207 C 0.1587 
D 0.0051 E 0.3446 F 0.025 


Question 21 


The mean weight of passengers on a certain route of an airline 
company is 70 kg with a variance of 64 kg². The number of passengers 
on a typical flight on that route is 100. 


Choose the option that gives the approximate probability that on a 
typical flight on that route, the total weight of passengers will not 
exceed 7200 kg. [2] 


Options for Question 21 
A 0.994 B 0.488 C 0.006 
D 0.512 E 0.01 F 0.450 
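
For a sum of many independent weights, the central limit theorem gives an approximately normal total with mean nμ and variance nσ². The sketch below applies that to the passenger total using `statistics.NormalDist` from the Python standard library; it assumes the passengers' weights are independent with the stated mean and variance.

```python
from statistics import NormalDist

n_passengers = 100
mean_weight = 70.0       # kg
variance_weight = 64.0   # kg squared

# Central limit theorem: the total is approximately normal with
# mean n * mu and variance n * sigma^2.
total_weight = NormalDist(mu=n_passengers * mean_weight,
                          sigma=(n_passengers * variance_weight) ** 0.5)
p_not_exceed = total_weight.cdf(7200.0)

print(round(p_not_exceed, 3))
```

Note that the variance, not the standard deviation, is multiplied by n before taking the square root.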


Question 22 


The random variable X has an exponential distribution with
parameter λ = 2.


Select the option that gives the variance of W = 4X. [2] 
Options for Question 22 
A 2 B 4 C i D 


ES Fi G 16 H 


= ele 


Question 23 


The independent random variables X and Y have chi-squared 
distributions with 5 and 3 degrees of freedom, respectively. 


Select the option that gives the standard deviation of U = X − Y. [2] 
Options for Question 23 

A2 B4 Cc v2 D9 

E 8 F V2 G 16 H 32 
Question 24


In a study of pregnancy and childbirth, the birth weight of each of a 
random sample of 53 newborn babies was recorded. Based on these 
data, a 95% confidence interval for the mean birth weight of babies 
was found to be (6.87, 7.45) pounds. You may assume that 1 pound is 
equivalent to 454 grams. 


Choose the option that gives the corresponding 95% confidence 
interval for the mean birth weight in grams. 


Options for Question 24 

A (6.87, 7.45) B (3118.98, 3382.30) 

C (0.0151, 0.0164) D (58.85, 63.82) 

E (428.425, 464.595) F (165 305.94, 179 261.90) 


Question 25 


In a large maternity hospital, records are kept of the number of 
babies born with a particular malformation. Assume that births are 
independent and that the probability of the malformation is the same 
for all births. Out of 3000 successive live births, six babies had the 
malformation. 


Choose the option that gives the upper limit of an approximate 95% 
confidence interval for the proportion of babies born with the 
malformation. 


Options for Question 25 
A 0.0033 B 0.0041 C 0.0039 
D 0.0020 E 0.0036 F 0.0070 
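
A hedged sketch of an approximate confidence interval for a proportion follows. It assumes the usual large-sample interval, estimate plus or minus z times the estimated standard error, which may not be the only method a course permits; the variable names are illustrative.

```python
from statistics import NormalDist

n_births = 3000
n_malformed = 6
confidence = 0.95

p_hat = n_malformed / n_births
standard_error = (p_hat * (1 - p_hat) / n_births) ** 0.5
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96

lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(round(lower, 4), round(upper, 4))
```

With so few observed cases the normal approximation is rough, so the result should be read as approximate.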


Question 26 


A random sample of fifteen observations is collected on a normally 
distributed random variable X. The sample mean is 12.5, and the 
sample standard deviation is 4.4. 


Choose the option that gives the lower limit of an exact 90% 
confidence interval for μ, the underlying population mean. 


Options for Question 26 
A 11.0 B 8.2 C 10.5 
D 10.1 E 9.5 F 14.5 
Question 27


A researcher is planning an experiment to test the null hypothesis 
that cakes produced by a new patented industrial process have the 
same volume as those produced by the company's existing process. 
Long experience of the existing process has shown that the mean 
volume of cakes produced by the existing process is 88 standard units, 
with a standard deviation of 5 standard units. The researcher is 
prepared to assume that the volumes of cakes produced by the new 
process will have a normal distribution with the same standard 
deviation as before, so σ = 5. The data will be analysed using a
two-sided test with significance level 5%. The experiment is to have
power 80% of finding a difference d in mean volume of 2 units.


Choose the option that gives the number of cakes that should be 
measured. [2] 


Options for Question 27 
A 50 B 49 C 39 
D 84 E 8 F 20 
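
A sample-size calculation of this kind can be sketched with the common normal-approximation formula n = (σ/d)² (z₁ + z₂)², where z₁ is the 1 − α/2 quantile and z₂ is the quantile matching the power. This is an assumption about the intended method (a one-sample test against a known mean, ignoring the negligible far-tail rejection probability), and the result is rounded up to a whole number of cakes.

```python
import math
from statistics import NormalDist

sigma = 5.0        # assumed common standard deviation
difference = 2.0   # difference in mean volume to be detected
alpha = 0.05       # two-sided significance level
power = 0.80

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
z_power = std_normal.inv_cdf(power)           # about 0.84

n_exact = (sigma / difference) ** 2 * (z_alpha + z_power) ** 2
n_required = math.ceil(n_exact)   # round up so the power is at least 80%

print(round(n_exact, 2), n_required)
```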


Question 28 


A linear regression model is to be fitted to data. The following 
summary statistics are calculated: 


n = 8,  Σx_i = 26,  Σy_i = 180,
Σx_i² = 127,  Σy_i² = 4347,  Σx_i y_i = 598.

Choose the TWO options that give (i) S_xx and (ii) S_xy. [2]
Options for Question 28 
You should choose TWO options for this question 
A 660.13 B 297 C 4605.25 D 602.5 
E 4331.13 F 13 G 549 H 42.5 
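
The corrected sums of squares and products come from the shortcut formulas S_xx = Σx² − (Σx)²/n and S_xy = Σxy − (Σx)(Σy)/n. The sketch below reads the summary statistics as n = 8, Σx = 26, Σy = 180, Σx² = 127, Σy² = 4347 and Σxy = 598; that reading of the printed values is an assumption.

```python
n = 8
sum_x, sum_y = 26.0, 180.0
sum_x_sq, sum_y_sq = 127.0, 4347.0
sum_xy = 598.0

# Shortcut formulas for the corrected sums of squares and products.
s_xx = sum_x_sq - sum_x ** 2 / n
s_yy = sum_y_sq - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

print(s_xx, s_yy, s_xy)
```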


Question 29 


A linear regression model is to be fitted to data. The following 
summary statistics are calculated: 


x̄ = 15,  ȳ = 11,  S_xx = 452,  S_yy = 187,  S_xy = 284.


Choose the option that gives the least squares estimate of the slope of 
the fitted regression line. [1] 


Options for Question 29 
A 1.58 B 1.59 C −12.88 
D 1.52 E 0.63 F 0.41 
PART 2


• This part of the paper carries 50% of the total marks.

• You should attempt ALL the questions in this part of the
  examination.

• Throughout Part 2 you should show all your working.

• Throughout Part 2 you should write your answer in pen in the
  space provided below each question.


Question 30 


Give two advantages that boxplots possess for the graphical summary 
of data, as compared with histograms. [2] 


Question 31 


The two histograms in Figure 2 represent the same data set, namely
63 measurements of annual snowfall (in inches) at Buffalo, New York,
USA. Why do the two histograms differ? [1]

[Figure 2: two histograms of the same data, each with horizontal axis Snowfall (inches), from 0 to 150, and vertical axis Frequency]
Question 32


Explain why the following function is not a valid probability mass 
function: 


p(r)—-1(4—2) z-—1,2,3,4,5. [1] 


Question 33 


A random sample of 529 men were questioned as part of a survey of 
drinking habits. In the survey, each man stated the number of units 
of alcoholic drinks that he consumed in the previous week. Based on 
these data, the mean number of units consumed by men in the 
previous week was 18.2, with 90% confidence interval (15.7, 20.7). 


(a) Interpret this confidence interval in terms of repeated 
experiments. [1] 


(b) Interpret this confidence interval in terms of plausible ranges. [2] 
Question 34


In a study, a researcher calculated that a 95% confidence interval for
the population mean μ (based on a t-distribution) was (2.58, 2.98).


(a) What would be the result of a fixed-level hypothesis test (based
on the same t-distribution) with hypotheses H0: μ = 3, H1: μ ≠ 3,
using a 5% significance level? [2]


(b) Another researcher studied the same data and claimed that the 
p value for a significance test (based on the same t-distribution) 
with the same hypotheses as in part (a) was 0.15. Why must the 
result of (at least) one of the researchers be in error? [1] 
Question 35


A sample of size 20 was drawn from population A and a sample of 
size 30 was drawn from population B. The comparative boxplots in 
Figure 3 represent the data in the two samples.

[Figure 3: comparative boxplots for the samples from populations A and B]


(a) Are the samples right-skew or left-skew? 


(b) In one or two sentences, compare the distributions of values in 
the two samples. 


(c) Why would it be inappropriate to use a two-sample t-test with 
these data? 
(d) A (two-sided) Mann-Whitney test is carried out using these data.
(i) State the null hypothesis of this test. [1] 


(ii) The value of the test statistic is u_A = 646. Find an 
approximate value for the p value of the test. [3] 


(iii) What can you conclude from this test? [2] 
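
The normal approximation used for part (d)(ii) can be sketched as below. It assumes the test statistic is the sum of the ranks of sample A in the pooled sample, with null mean n_A(n_A + n_B + 1)/2 and null variance n_A n_B (n_A + n_B + 1)/12; a value of 646 exceeds n_A n_B = 600, so it cannot be a pairwise-count form of the statistic. No correction for ties or continuity is applied.

```python
from statistics import NormalDist

n_a, n_b = 20, 30
u_a = 646   # observed statistic, taken here to be the rank sum for sample A

# Null mean and variance of the rank sum (no ties assumed).
mean_u = n_a * (n_a + n_b + 1) / 2
var_u = n_a * n_b * (n_a + n_b + 1) / 12

z = (u_a - mean_u) / var_u ** 0.5
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(mean_u, var_u, round(z, 2), round(p_value, 3))
```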
Question 36


The data in the table below are measurements on two (non-matched)
samples of patients in a clinical trial to compare two drugs, A and B.


Drug A   3.2   0.7   1.0   2.3   2.4
Drug B   1.4   2.6   5.4   4.5   1.7


The sample means are x̄_A = 1.92, x̄_B = 3.12, and the pooled estimate
of the common population variance is s_P² = 2.09. Carry out a
two-sided t-test of the null hypothesis that the two drugs produce the
same mean effects. [5]
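
A minimal sketch of the pooled two-sample t statistic for these data follows. The data values are read as 3.2, 0.7, 1.0, 2.3, 2.4 for Drug A and 1.4, 2.6, 5.4, 4.5, 1.7 for Drug B, a reading that reproduces the stated means and pooled variance. The critical value 2.306 is the tabulated 0.975 quantile of the t-distribution with 8 degrees of freedom, entered by hand as an assumption because the Python standard library has no t-distribution.

```python
from statistics import mean, variance

drug_a = [3.2, 0.7, 1.0, 2.3, 2.4]
drug_b = [1.4, 2.6, 5.4, 4.5, 1.7]
n_a, n_b = len(drug_a), len(drug_b)
degrees_of_freedom = n_a + n_b - 2

# Pooled estimate of the common variance (variance() uses divisor n - 1).
pooled_var = ((n_a - 1) * variance(drug_a)
              + (n_b - 1) * variance(drug_b)) / degrees_of_freedom

t_stat = (mean(drug_a) - mean(drug_b)) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5

t_critical = 2.306   # tabulated 0.975 quantile of t(8), entered by hand
reject_null = abs(t_stat) > t_critical

print(round(pooled_var, 3), round(t_stat, 3), reject_null)
```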


For Examiner's use only: 
Question 37


The statements listed in the table below are to be included in a
statistical report. For each statement, name the section of the report
in which it should feature (A: Introduction, B: Methods, C: Results or
D: Discussion). Write your answers in the table. [3]


Of 567 cars involved in a crash, 85 were red.


The proportion of cars involved in a crash
that were red was compared to the
proportion of cars known to be on the road
that are red using a chi-squared
goodness-of-fit test.


The significance probability for the
chi-squared goodness-of-fit test was 0.243.


The aim of the study was to investigate the
effect of a car's colour on the risk of being
involved in an accident.


We conclude that red cars are no more
likely to be involved in a crash than are cars
of other colours.


The results were analysed using Minitab
Version 16.
Question 38


The following random sample of size four was drawn from a
population with the continuous uniform distribution U(0, θ), where
θ is an unknown parameter.


1.43 8.29 2.30 4.07


(a) Write down the maximum likelihood estimate of θ. [1]


(b) Explain why, in practice, an alternative estimator for the
parameter θ is often used. [1]


Question 39 


The following sample of size three was drawn from a population that 
has a geometric distribution with unknown parameter p. 


3 1 9 


Write down the likelihood of p for this sample, and simplify your 
expression. [3] 
Question 40


Figure 4 represents data on the net disposable income in thousands of 
euros per head of population (x) and the number of television sets per 
1000 people (y) for eleven countries of western Europe. 


[Figure 4: scatterplot of television sets (per 1000 people), from 150 to 400, against disposable income (000 euros per person), from 2 to 12]


A linear regression model is used to model the relationship between 
the two variables. The equation of the least squares line for these 
data is 


y = 133.56 + 21.65x.


(a) According to the fitted model, what would be the effect on the 
number of television sets per 1000 people of increasing a country’s 
net disposable income by 1000 euros? [1] 


(b) For these data, S_xx = Σ(x_i − x̄)² = 82.01 and s² = 1072.0.
Obtain a 95% confidence interval for the slope parameter β of the
regression line. [4]
Question 41


When fitting a linear regression model, what plots would you obtain 
in order to check that the assumptions of the linear regression model 
are satisfied? Say what you would look for in each plot. [3] 


Question 42 


(a) Draw a rough scatterplot to illustrate a sample of data with two 
variables for which the Spearman correlation is 1 but the Pearson 
correlation is less than 1. [1] 


(b) If the Pearson correlation is 1, must the Spearman correlation 
also be 1? Explain your answer. [1] 
Question 43


Thirty passengers on a flight from Los Angeles to London took part 
in an experiment to investigate whether a new drug suppresses jet 
lag. The subjects were divided into two groups; one was given the 
course of treatment, and the other was given a placebo (an inactive 
substance). The data from the experiment are in the following table. 


Results of an experiment 


Jet lag No jet lag Row total 


Treatment group 3 12 15 
Placebo group 10 5 15 


Column total 13 17 30 


A chi-squared test is to be used with the data in the table to 
investigate whether the treatment suppressed jet lag. 


(a) State the null hypothesis of the test. [1] 


(b) Calculate the expected frequency for each cell, and hence find the 
value of the test statistic. [2] 
(c) What can you say about the p value for these data? [2]


(d) Report your conclusions. [2] 
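
The expected frequencies and test statistic for a two-by-two table like this one can be sketched as follows. The sketch assumes the standard chi-squared statistic, the sum of (observed − expected)²/expected over the four cells, with expected frequency equal to row total times column total divided by the grand total, and no continuity correction.

```python
observed = [[3, 12],    # treatment group: jet lag, no jet lag
            [10, 5]]    # placebo group:   jet lag, no jet lag

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell under the null hypothesis.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum((o - e) ** 2 / e
                  for obs_row, exp_row in zip(observed, expected)
                  for o, e in zip(obs_row, exp_row))
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_squared, 3), degrees_of_freedom)
```

The statistic is then compared with quantiles of the chi-squared distribution on the stated degrees of freedom, taken from tables.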


For Examiner's use only: 